26 research outputs found
Mining Discriminative Triplets of Patches for Fine-Grained Classification
Fine-grained classification involves distinguishing between similar
sub-categories based on subtle differences in highly localized regions;
therefore, accurate localization of discriminative regions remains a major
challenge. We describe a patch-based framework to address this problem. We
introduce triplets of patches with geometric constraints to improve the
accuracy of patch localization, and automatically mine discriminative
geometrically-constrained triplets for classification. The resulting approach
only requires object bounding boxes. Its effectiveness is demonstrated using
four publicly available fine-grained datasets, on which it outperforms or
achieves comparable performance to the state-of-the-art in classification.
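The idea of a geometric constraint on a triplet of patches can be illustrated with a small sketch. The abstract does not spell out the constraints, so the check below is an assumption chosen for illustration: it accepts a candidate triplet only if its three patch centers have the same orientation (clockwise or counter-clockwise) as a template triplet. The function names and coordinates are hypothetical.

```python
def signed_area(a, b, c):
    """Twice the signed area of triangle (a, b, c); the sign encodes orientation."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def same_order(template, candidate):
    """True if the candidate triplet has the same orientation as the template.

    Assumed constraint for illustration only; collinear triplets are rejected.
    """
    return signed_area(*template) * signed_area(*candidate) > 0


# Patch centers as (x, y) pixel coordinates (made-up values).
template = [(10, 10), (50, 12), (30, 40)]
consistent = [(12, 8), (55, 15), (28, 44)]   # slightly shifted patches
flipped = [(55, 15), (12, 8), (28, 44)]      # first two patches swapped

print(same_order(template, consistent))
print(same_order(template, flipped))
```

A check of this kind rejects triplets whose individual patches match well in appearance but sit in an implausible arrangement, which is one way a geometric constraint could improve localization.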
Learning Rich Features for Image Manipulation Detection
Image manipulation detection is different from traditional semantic object
detection because it pays more attention to tampering artifacts than to image
content, which suggests that richer features need to be learned. We propose a
two-stream Faster R-CNN network and train it end-to-end to detect the tampered
regions given a manipulated image. One of the two streams is an RGB stream
whose purpose is to extract features from the RGB image input to find tampering
artifacts like strong contrast difference, unnatural tampered boundaries, and
so on. The other is a noise stream that leverages the noise features extracted
from a steganalysis rich model filter layer to discover the noise inconsistency
between authentic and tampered regions. We then fuse features from the two
streams through a bilinear pooling layer to further incorporate spatial
co-occurrence of these two modalities. Experiments on four standard image
manipulation datasets demonstrate that our two-stream framework outperforms
each individual stream, and also achieves state-of-the-art performance compared
to alternative methods with robustness to resizing and compression.
Comment: CVPR 2018 Camera Ready
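The fusion step can be sketched in a few lines. The example below is a minimal illustration of plain bilinear pooling, assuming each stream yields one feature vector per region: it takes the outer product of the two vectors and flattens it, so every RGB feature is paired with every noise feature. The signed square root and L2 normalization are common practice with bilinear features and are an assumption here, not something the abstract states. All names and values are hypothetical.

```python
import math


def bilinear_pool(rgb_feat, noise_feat):
    """Fuse two feature vectors via a flattened outer product.

    Illustrative sketch only; the normalization steps are assumed.
    """
    # Outer product: every pairwise interaction between the two streams.
    outer = [r * n for r in rgb_feat for n in noise_feat]
    # Signed square root dampens large products while keeping their sign.
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in outer]
    # L2 normalization; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(v * v for v in rooted))
    return [v / norm for v in rooted] if norm > 0 else rooted


fused = bilinear_pool([1.0, 2.0], [3.0, 0.0, -1.0])
print(len(fused))  # 2 * 3 pairwise interactions
```

The output dimension is the product of the two input dimensions, which is why compact approximations are often preferred for large feature maps.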
Learning to Detect Carried Objects with Minimal Supervision
We propose a learning-based method for detecting carried objects that
generates candidate image regions from protrusion, color contrast and
occlusion boundary cues, and uses a classifier to filter out the regions
unlikely to be carried objects. The method achieves higher accuracy than
the state of the art, which can only detect protrusions from the human
shape, and the discriminative model it builds for the silhouette
context-based region features generalizes well. To reduce annotation
effort, we investigate training the model in a Multiple Instance
Learning framework where the only available supervision is "walk" and
"carry" labels associated with intervals of human tracks, i.e., the
spatial extent of carried objects is not annotated. We present an
extension to the miSVM algorithm that uses knowledge of the fraction of
positive instances in positive bags and that scales to training sets of
hundreds of thousands of instances.
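One way to use a known fraction of positive instances can be sketched as a relabeling step inside an miSVM-style loop. The abstract does not describe how the extension works, so the rule below is an assumption made for illustration: within each positive bag, the instances with the highest current classifier scores are labeled positive until the given fraction is reached, and the rest are labeled negative. The function name, scores, and fraction are hypothetical.

```python
import math


def relabel_positive_bag(scores, positive_fraction):
    """Assign +1 to the top-scoring fraction of a positive bag, -1 to the rest.

    Assumed relabeling rule for illustration; at least one instance is
    always kept positive so the bag remains a positive bag.
    """
    k = max(1, math.ceil(positive_fraction * len(scores)))
    # Instance indices sorted from highest to lowest classifier score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [-1] * len(scores)
    for i in ranked[:k]:
        labels[i] = 1
    return labels


# Classifier scores for the instances of one "carry" bag (made-up values).
labels = relabel_positive_bag([0.9, -0.4, 0.1, -1.2], positive_fraction=0.5)
print(labels)
```

In a full training loop, this step would alternate with retraining the classifier on the updated instance labels until the labels stop changing.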